20 research outputs found

    Query expansion with terms selected using lexical cohesion analysis of documents

    Get PDF
    Cataloged from PDF version of article.We present new methods of query expansion using terms that form lexical cohesive links between the contexts of distinct query terms in documents (i.e., words surrounding the query terms in text). The link-forming terms (link-terms) and short snippets of text surrounding them are evaluated in both interactive and automatic query expansion (QE). We explore the effectiveness of snippets in providing context in interactive query expansion, compare query expansion from snippets vs. whole documents, and query expansion following snippet selection vs. full document relevance judgements. The evaluation, conducted on the HARD track data of TREC 2005, suggests that there are considerable advantages in using link-terms and their surrounding short text snippets in QE compared to terms selected from full-texts of documents. (C) 2006 Elsevier Ltd. All rights reserved

    On document relevance and lexical cohesion between query terms

    Get PDF
    Lexical cohesion is a property of text, achieved through lexical-semantic relations between words in text. Most information retrieval systems make use of lexical relations in text only to a limited extent. In this paper we empirically investigate whether the degree of lexical cohesion between the contexts of query terms' occurrences in a document is related to its relevance to the query. Lexical cohesion between distinct query terms in a document is estimated on the basis of the lexical-semantic relations (repetition, synonymy, hyponymy and sibling) that exist between there collocates - words that co-occur with them in the same windows of text. Experiments suggest significant differences between the lexical cohesion in relevant and non-relevant document sets exist. A document ranking method based on lexical cohesion shows some performance improvements. © 2006 Elsevier Ltd. All rights reserved

    A Novel Combined Term Suggestion Service for Domain-Specific Digital Libraries

    Full text link
    Interactive query expansion can assist users during their query formulation process. We conducted a user study with over 4,000 unique visitors and four different design approaches for a search term suggestion service. As a basis for our evaluation we have implemented services which use three different vocabularies: (1) user search terms, (2) terms from a terminology service and (3) thesaurus terms. Additionally, we have created a new combined service which utilizes thesaurus term and terms from a domain-specific search term re-commender. Our results show that the thesaurus-based method clearly is used more often compared to the other single-method implementations. We interpret this as a strong indicator that term suggestion mechanisms should be domain-specific to be close to the user terminology. Our novel combined approach which interconnects a thesaurus service with additional statistical relations out-performed all other implementations. All our observations show that domain-specific vocabulary can support the user in finding alternative concepts and formulating queries.Comment: To be published in Proceedings of Theories and Practice in Digital Libraries (TPDL), 201

    Mining document, concept, and term associations for effective biomedical retrieval - Introducing MeSH-enhanced retrieval models

    Get PDF
    Manually assigned subject terms, such as Medical Subject Headings (MeSH) in the health domain, describe the concepts or topics of a document. Existing information retrieval models do not take full advantage of such information. In this paper, we propose two MeSH-enhanced (ME) retrieval models that integrate the concept layer (i.e. MeSH) into the language modeling framework to improve retrieval performance. The new models quantify associations between documents and their assigned concepts to construct conceptual representations for the documents, and mine associations between concepts and terms to construct generative concept models. The two ME models reconstruct two essential estimation processes of the relevance model (Lavrenko and Croft 2001) by incorporating the document-concept and the concept-term associations. More specifically, in Model 1, language models of the pseudo-feedback documents are enriched by their assigned concepts. In Model 2, concepts that are related to users’ queries are first identified, and then used to reweight the pseudo-feedback documents according to the document-concept associations. Experiments carried out on two standard test collections show that the ME models outperformed the query likelihood model, the relevance model (RM3), and an earlier ME model. A detailed case analysis provides insight into how and why the new models improve/worsen retrieval performance. Implications and limitations of the study are discussed. This study provides new ways to formally incorporate semantic annotations, such as subject terms, into retrieval models. The findings of this study suggest that integrating the concept layer into retrieval models can further improve the performance over the current state-of-the-art models.Ye

    LRD: latent relation discovery for vector space expansion and information retrieval

    Get PDF
    In this paper, we propose a text mining method called LRD (latent relation discovery), which extends the traditional vector space model of document representation in order to improve information retrieval (IR) on documents and document clustering. Our LRD method extracts terms and entities, such as person, organization, or project names, and discovers relationships between them by taking into account their co-occurrence in textual corpora. Given a target entity, LRD discovers other entities closely related to the target effectively and efficiently. With respect to such relatedness, a measure of relation strength between entities is defined. LRD uses relation strength to enhance the vector space model, and uses the enhanced vector space model for query based IR on documents and clustering documents in order to discover complex relationships among terms and entities. Our experiments on a standard dataset for query based IR shows that our LRD method performed significantly better than traditional vector space model and other five standard statistical methods for vector expansion
    corecore